Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description

Abstract

Imbalanced datasets (denoted throughout the paper by ID) are datasets that contain several classes, usually two, where one class has considerably fewer samples than the other(s). They emerge in many real-world problems, such as health care and disease diagnosis systems, anomaly detection, fraud detection, and stream-based malware detection. These datasets cause problems such as under-training of the minority class(es), over-training of the majority class(es), and bias toward the majority class(es) in the classification process and its applications. They have therefore drawn the attention of researchers in many fields, and several solutions have been proposed for dealing with this problem. The main aim of this study for dealing with IDs is to resample the borderline samples discovered by Support Vector Data Description (SVDD). There are naturally two kinds of resampling: under-sampling (U-S) and over-sampling (O-S). O-S may cause over-fitting, which is its main drawback. U-S can cause significant information loss, which is its main drawback. In this study, to avoid the drawbacks of the sampling techniques, we focus on the samples that may be misclassified. The data points that can be misclassified are considered to be the borderline data points, which lie on the border(s) between the majority class(es) and the minority class(es). First, we use SVDD to find the borderline examples; then, resampling is applied over them. At the next step, a base classifier is trained on the newly created dataset. Finally, we compare the results of our method, in terms of Area Under Curve (AUC), F-measure, and G-mean, with other state-of-the-art methods. We show that our method has better results than the other methods in our experimental study.
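The steps named in the abstract (find borderline samples with SVDD, resample them, then evaluate with G-mean) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it approximates a hard-margin SVDD (no slack parameter) with a simple Frank-Wolfe iteration, it assumes the resampling step duplicates borderline samples of the minority class, and it omits the base classifier entirely. The names `svdd_boundary`, `oversample_boundary`, `g_mean`, `tol`, and `copies` are hypothetical.

```python
import math

def linear(a, b):
    """Plain dot product; with this kernel SVDD is a minimum enclosing ball."""
    return sum(x * y for x, y in zip(a, b))

def rbf(a, b, gamma=0.5):
    """Gaussian kernel; gamma=0.5 is an arbitrary illustrative value."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def svdd_boundary(points, kernel=rbf, iters=2000, tol=0.05):
    """Approximate hard-margin SVDD; return indices of samples near the sphere surface."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    alpha = [0.0] * n
    alpha[0] = 1.0              # start with the centre on the first sample
    Ka = K[0][:]                # running product K @ alpha
    for k in range(1, iters + 1):
        # Frank-Wolfe step: shift the centre toward the currently farthest sample.
        far = max(range(n), key=lambda i: K[i][i] - 2.0 * Ka[i])
        step = 1.0 / (k + 1)
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[far] += step
        Ka = [(1.0 - step) * v + step * K[far][i] for i, v in enumerate(Ka)]
    aKa = sum(a * v for a, v in zip(alpha, Ka))
    # Squared feature-space distance of every sample to the centre.
    dist2 = [K[i][i] - 2.0 * Ka[i] + aKa for i in range(n)]
    radius2 = max(dist2)
    return [i for i, d in enumerate(dist2) if d >= (1.0 - tol) * radius2]

def oversample_boundary(X, y, minority, copies=2, kernel=rbf):
    """Assumed resampling rule: append `copies` duplicates of each minority boundary sample."""
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    boundary = svdd_boundary(minority_pts, kernel)
    extra = [minority_pts[i] for i in boundary for _ in range(copies)]
    return list(X) + extra, list(y) + [minority] * len(extra)

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (both classes must be present)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))
```

In practice the resampled `X, y` would be passed to whichever base classifier is being studied, and a soft-margin SVDD solved with a proper quadratic-programming routine would replace the approximation used here.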

Related articles

Dealing with Imbalanced Data using Bayesian Techniques

In the present work, we deal with the significant problem of high imbalance in data in binary or multi-class classification problems. We study two different linguistic applications. The former determines whether a syntactic construction (environment) that co-occurs with a verb in a natural text corpus constitutes a subcategorization frame of the verb or not. The latter is called Name Entity Recognitio...

Class-Boundary Alignment for Imbalanced Dataset Learning

In this paper, we propose the class-boundary-alignment algorithm to augment SVMs to deal with imbalanced training-data problems posed by many emerging applications (e.g., image retrieval, video surveillance, and gene profiling). Through a simple example, we first show that SVMs can be ineffective in determining the class boundary when the training instances of the target class are heavily outnum...

High performance of the support vector machine in classifying hyperspectral data using a limited dataset

To prospect mineral deposits at regional scale, recognition and classification of hydrothermal alteration zones using remote sensing data is a popular strategy. Due to the large number of spectral bands, classification of the hyperspectral data may be negatively affected by the Hughes phenomenon. A practical way to handle the Hughes problem is preparing a lot of training samples until the size ...

Ellipse Support Vector Data Description

This paper presents a novel boundary-based approach to one-class classification that is inspired by support vector data description (SVDD). The SVDD is a popular kernel method that tries to fit a hypersphere around the target objects, and a more precise boundary relies on selecting proper parameters for the kernel functions. Even with a flexible Gaussian kernel function, the SVDD cou...

Subspace Support Vector Data Description

This paper proposes a novel method for solving one-class classification problems. The proposed approach, namely Subspace Support Vector Data Description, maps the data to a subspace that is optimized for one-class classification. In that feature space, the optimal hypersphere enclosing the target class is then determined. The method iteratively optimizes the data mapping along with data descript...

Journal

Journal title: Computers, Materials & Continua

Year: 2021

ISSN: 1546-2218, 1546-2226

DOI: https://doi.org/10.32604/cmc.2021.012547